Conference Proceedings

Automatic webpage briefing

Y Dai, R Zhang, J Qi

Proceedings International Conference on Data Engineering | IEEE COMPUTER SOC | Published: 2021

Abstract

We introduce the task of webpage briefing (WB) to provide a summary of a webpage in a hierarchical manner, from the broad topic of the webpage to finer-level key attributes. A straightforward approach for this task is to train a machine learning model for generating topics and extracting key attributes. However, such a model may not perform well on webpages from domains not seen in the training data. An ideal model should be able to adapt to unseen domains while preserving knowledge learned from the seen domains. Knowledge distillation (KD) offers a potential solution, in which a teacher pre-trained on specific domains can pass its knowledge to a student, while unseen domains can..


University of Melbourne Researchers